**********Generate consolidated variables -01***************
clear all

**********
*1. KNC
**********
*In the KNC data, there is a variable named "ik", which indicates if the parent company is the largest shareholder of the group, if yes, then it is marked as "k". Each firm observation has a group identifier (PeOrgNr_K) and its own identifier/bidnr (PeOrgNr_D). We drop all the data with "ik" != "k". 

*Get the group-level identifier by KNC dataset
use "kth_knc_201907.dta", clear
drop if ik == "I"
codebook PeOrgNrK_LopNr
*158,195 unique group identifiers
codebook PeOrgNrD_LopNr
*438,699 unique bidnrs (firm identifiers - organization numbers)
gen Ar = substr(kboksl,1,4)
bysort PeOrgNrK_LopNr: egen check_id = nvals(PeOrgNrD_LopNr)
codebook check_id
table Ar

* We generate the summary statistics of our data sample. From 1994 to 2017, the KNC data comprises a total of 438,699 unique PeOrgNr_D(bidnrs), and 158,195 group identifiers. However, in 1994 we only have 11 valid observations. After 1995, the number of observations increased steadily by years until it fell to around 30,000 observations in 2017. Because too few obs may effect the reliability of the regression results, we set 1996 as the start year and replace all the consolidated financial variables which showed before 1996 as missings in the future steps. 

order PeOrgNrD_LopNr PeOrgNrK_LopNr
rename PeOrgNrD_LopNr bidnr
*We check if the bidnr is unique within each year, the duplicates indicate that in different months within a certain year, the bidnrs are different
duplicates tag bidnr Ar, gen(id1)
gen month = substr(kboksl,5,6)
* If there are more than 1 organization numbers within a year, we keep the one at the year-end
destring month,replace
bysort bidnr:egen month_m = max(month)
drop if month != month_m & id1 != 0
destring Ar, replace
* Generate the first year for each bidnr
bysort PeOrgNrK_LopNr: egen f_y = min(Ar)
duplicates tag bidnr Ar, gen(id_n)
codebook id_n
drop id1 month month_m id_n check_id
rename Ar year
* We merge the group identifiers with the master dataset
merge 1:m bidnr year using  "master_file_with_fuels_noemissions_w_inv.dta" 

*merge 1:m bidnr year using  "\\micro.intra\projekt\P0789$\P0789_Gem\Gustav\Material for inspection\magnitude_tax_20220522_noemission_req.dta" 
drop founded
save "noemission_req_KNC_merged.dta" ,replace


*************
*2. Financial variables
*************
use "noemission_req_KNC_merged.dta" ,replace
*Keep accounting vars 
keep bidnr PeOrgNrK_LopNr PeOrgNrM_LopNr ik kboksl landkod proc antanst depreciation_all ntoms_12month total_current_assets cash_sec sales_ppi_12month sales_defl_12month total_debt orgnrk_foreign orgnrm_foreign orgnrd_foreign year _merge f_y  dividend ta defl_2010 cash_flow

order PeOrgNrK_LopNr bidnr year
gsort PeOrgNrK_LopNr year

* Currently we have a panel dataset, with cross-section firms(PeOrgNr_D) and years. The firms who share the same PeOrgNr_K are considered as a group. After merging with the emissions dataset, there are some unmatched observations. The main reason could be the differences between the time period. There are also some firms only show in the emissions' dataset, not in corporate structures' dataset (knc). In this case, we replace their group identifiers by their organization numbers and consider them as "stand-alone firms", which means their consolidated values will be independent (as their original financial values. 

*Similarly, we identify a bidnr as stand-alone when the parent identifier K is discontinuous or disappeared. (For example, if a firm with organization number 93 has the PeOrgNr_K 84 from 2001 to 2004 and 2007 to 2009,  while in other years the group identifiers are missing, then between 2005 to 2006 it will be a stand-alone firm with PeOrgNr_K as its bidnr 93.) 


*The dummy variable sf means if the company is considered as a stand-alone firm
gen sf = 1 if PeOrgNrK_LopNr ==.
replace sf = 0 if sf ==.

*If there is no information in the knc file, we replace the organization number as the its PeOrgNrK_LopNr and consider it as a stand-alone firm
replace PeOrgNrK_LopNr = bidnr if PeOrgNrK_LopNr ==.
bysort PeOrgNrK_LopNr: egen first_y = min(year)
bysort PeOrgNrK_LopNr: egen f_y1 = max(f_y)

* Generate a variable indicate how many bidnrs share the same PeOrgNrK_LopNr at year level
bysort PeOrgNrK_LopNr year: egen c_uniq = nvals(bidnr)
codebook c_uniq
order PeOrgNrK_LopNr bidnr year antanst-ntoms_12month sales_ppi_12month-sales_defl_12month cash_flow

*Replace the deflation factor (base year 2010) for the firms who share the same PeOrgNrK_LopNr
bysort PeOrgNrK_LopNr year: egen defl_2010_m = max(defl_2010)

*Generate the CPI-adjusted financial variables
foreach i of varlist depreciation_all-total_debt cash_flow{
bysort bidnr year: gen `i'_defl = `i'/ defl_2010_m
}
order PeOrgNrK_LopNr bidnr year antanst-sales_defl_12month depreciation_all_defl-total_debt_defl cash_flow_defl

*Generate the consolidated vars, for the groups, the financial variables are aggregated. If the firm is stand-alone, then the consolidated values will be its own data. Before 1996, we replace all the consolidated variables as missing
foreach i of varlist antanst-total_debt_defl cash_flow_defl{
	bysort PeOrgNrK_LopNr year: egen con_`i' = total(`i') if sf == 0 
	bysort PeOrgNrK_LopNr year: replace con_`i' = `i' if sf == 1 
	bysort PeOrgNrK_LopNr year: replace con_`i' =. if (con_`i' == 0 &`i' == .) | (year < 1996)
} 

order PeOrgNrK_LopNr bidnr year kboksl antanst-total_debt_defl cash_flow_defl defl_2010_m-con_total_debt_defl con_cash_flow_defl sf ik
gsort PeOrgNrK_LopNr year

label variable PeOrgNrK_LopNr "Basic OrgN"
label variable f_y1 "First year of each bidnr in KNC"
label variable first_y "First year of each bidnr in non-req emissions"
label variable sf "Stand-alone firms, if yes then 1"
label variable c_uniq "N of daughters by OrgN_K and year"
drop  _merge PeOrgNrM_LopNr f_y
codebook bidnr
save "emissions_req_KNC_final.dta",replace

* Merge with the master emissions' dataset
*merge 1:1 bidnr year using "\\micro.intra\projekt\P0789$\P0789_Gem\Gustav\Material for inspection\magnitude_tax_20220522_emission_req.dta" 
merge 1:1 bidnr year using "master_file_with_fuels_emissions_w_inv.dta" 

keep if _merge == 3
drop _merge
sort bidnr year
save "emissions_req_KNC.dta",replace

*Merge with the credit rating dataset
use "uc_credit_score.dta" ,replace
rename peorgnr_LopnNr bidnr
duplicates drop bidnr period riskklass, force
drop year
gen year = substr(period,1,4)
destring year,replace
merge 1:1 bidnr year using "emissions_req_KNC.dta"
codebook year if _merge == 2
* The period of credit score is from 2007 to 2016
drop if _merge == 1
drop _merge c_uniq 
order PeOrgNrK_LopNr year
gsort PeOrgNrK_LopNr year
drop founded
save "emissions_req_KNC_credit.dta",replace










